Abstract

Background: Small nucleolar RNAs (snoRNAs) are a class of non-coding RNAs that guide the modification of 
specific nucleotides in ribosomal RNAs (rRNAs) and small nuclear RNAs (snRNAs). Although most non-coding RNAs 
undergo post-transcriptional modifications prior to maturation, the functional significance of these modifications 
remains unknown. Here, we introduce the snoRNA orthological gene database (snOPY) as a tool for studying RNA 
modifications. 

Findings: snOPY provides comprehensive information about snoRNAs, snoRNA gene loci, and target RNAs. It also 
contains data for orthologues from various species, which enables users to analyze the evolution of snoRNA genes. 
In total, 13,770 snoRNA genes, 10,345 snoRNA gene loci, and 133 target RNAs have been registered. Users can 
search and access the data efficiently using a simple web interface with a series of internal links. snOPY is freely 
available on the web at http://snoopy.med.miyazaki-u.ac.jp. 

Conclusions: snOPY is a database that provides information about small nucleolar RNAs and their
orthologues. It will help users study RNA modifications and snoRNA gene evolution.
Findings

Background 

Large-scale sequencing and transcriptome analyses have
revealed that most of the genome is transcribed and that
there are a large number of non-protein-coding transcripts
present in the cell [1]. Functional non-coding
RNAs (ncRNAs) include microRNAs (miRNAs), short
interfering RNAs (siRNAs), and Piwi-interacting RNAs
(piRNAs), which play important roles in biological processes
such as gene expression, gene silencing, and RNA
processing [2]. In addition, there are many classical essential
ncRNAs, including ribosomal RNAs (rRNAs),
small nuclear RNAs (snRNAs), and tRNAs. Some of
these RNAs are known to undergo post-transcriptional
modifications [3-5]. Experimental results have shown
that deficiencies in RNA-modifying enzymes lead to embryonic
death in mice, and the loss of rRNA modification
leads to developmental defects in zebrafish, which
signifies the importance of RNA modifications for the
proper functioning of ncRNAs [6,7]. Although many
modification sites have been identified [8], the functions
of these modifications remain unknown.

Small nucleolar RNAs (snoRNAs) play key roles in the 
RNA modification process. These RNAs function as 
guide RNAs for the site-specific modification of target 
RNAs such as rRNAs and snRNAs [9]. Over the last 
decade, a large number of snoRNAs have been identified 
experimentally or computationally in various species 
[10,11]. These RNAs are encoded by three types of genomic
loci, i.e., intronic gene loci, polycistronic gene loci
(clusters), and monocistronic gene loci (independent)
[9]. The snoRNA genes of different loci must be
expressed in different ways but in a coordinated manner.
For example, for the maturation of human 28S rRNA,
98 distinct snoRNA genes need to be expressed simultaneously
from 65 independent loci. It is still unclear
how the expression of these snoRNAs is regulated in a 
synchronized manner. 

We have constructed the snoRNA orthological gene
database (snOPY) as a tool for studying RNA modifications
and snoRNA gene evolution. This database provides
comprehensive information about snoRNAs, snoRNA
gene loci, and target RNAs. In addition, it includes manually
curated orthologous gene data for each gene. This
unique database enables users to analyze not only
snoRNAs but also their targets and gene organization in
various species.

Database content 

snOPY provides three main types of information: 
snoRNA, snoRNA gene locus, and target RNA (Table 1). 
As of October 2013, it contains 13,770, 10,345, and 133 
records of snoRNAs, snoRNA gene loci, and target RNAs, 
respectively. 

Table 1 snOPY statistics

Classification                     No.
Records
  Species                           34
  snoRNA gene                   13,770
  Gene locus                    10,345
  Target RNA                       133
Box type
  C/D                            4,795
  H/ACA                          7,913
  H/ACA, C/D                         2
  Unclassified                   1,060
Gene locus
  Intronic                       2,539
  Polycistronic                    473
  Monocistronic                  7,333
Target RNA
  Ribosomal RNA (rRNA)             101
  Small nuclear RNA (snRNA)         32

Numbers include both curated and noncurated data. As of October 2013, the
number of curated snoRNA gene entries is 2,024.

snoRNA

The major function of snoRNAs is to guide the modification
of rRNAs or snRNAs via antisense RNA:RNA
interactions with their target RNAs (Figure 1). snoRNAs
are divided into two major classes based on highly conserved
motifs, i.e., the C/D and H/ACA boxes [9]. The
C/D box snoRNAs contain two sequence motifs (C box:
TGATGA; D box: CTGA) and direct the 2'-O-methylation
of their target RNAs. In these snoRNAs, a region
upstream of the D or D' box is complementary to the
target RNA, and the target nucleotide base-paired with the
fifth nucleotide upstream of these boxes is modified
(Figures 1 and 2) [12]. The H/ACA box
snoRNAs also contain two sequence motifs (H box:
ANANNA; ACA box: ACA) and guide the pseudouridylation
(conversion of uridine to pseudouridine) of
the target RNA. The modification site is located at the
pseudouridylation pocket, which is formed by an RNA:RNA
antisense interaction between complementary sequences
of the snoRNA and target RNA (Figure 1) [13].
The snoRNA data were collected from public databases
according to the sequence annotation and manually
curated.
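
The C/D box rule described above (guide region upstream of the D box, modification at the position 5 nt upstream of the box) can be illustrated with a minimal Python sketch. This is not the snOPY implementation. The function names, the fixed guide length of 10 nt, the choice of the 3'-most CTGA as the D box, and the requirement of an exact Watson-Crick match are all simplifying assumptions made for illustration; real guide:target duplexes vary in length and may tolerate mismatches.

```python
# Illustrative sketch only; names and parameters are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
D_BOX = "CTGA"  # D box consensus as given in the text (DNA alphabet)


def reverse_complement(seq):
    """Return the reverse complement of a DNA-alphabet sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def predict_methylation_site(snorna, target, guide_len=10):
    """Return the 1-based target position predicted to be 2'-O-methylated.

    Returns None if no D box is found, if there is not enough sequence
    upstream of it, or if the guide has no exact match in the target.
    """
    d_start = snorna.rfind(D_BOX)  # assumption: D box is the 3'-most CTGA
    if d_start < guide_len:
        return None
    guide = snorna[d_start - guide_len:d_start]
    hit = target.find(reverse_complement(guide))
    if hit < 0:
        return None
    # The fifth snoRNA nucleotide upstream of the D box pairs with the
    # fifth nucleotide of the matched target segment (0-based offset 4).
    return hit + 4 + 1


# Synthetic example sequences (not real snoRNA or rRNA data).
target = "AAACCGGTTACGTAGCTAGG"
snorna = "ATGATGAT" + "CTACGTAACC" + "CTGA" + "GC"
site = predict_methylation_site(snorna, target)
```

In this synthetic example the guide pairs with target positions 6 to 15, so the predicted site is the nucleotide opposite the fifth guide position counted from the D box.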

Gene locus 

There are three types of snoRNA gene loci: intronic,
polycistronic, and monocistronic [9,14]. In intronic loci,
the snoRNA gene is located within the intron of protein-coding
or non-protein-coding genes (host gene) and
transcribed simultaneously with its host gene under the
control of the host gene promoter. The maturation of
snoRNA transcripts is achieved via the splicing and subsequent
processing of the host gene. In the animal kingdom,
most snoRNA genes are expressed from introns
[14]. The polycistronic loci contain multiple snoRNA
genes that are organized into a cluster and transcribed
from a single promoter, whereas the monocistronic loci
contain a single snoRNA gene that is expressed from its
own promoter. In plants and yeast, most of the snoRNA
genes exhibit either polycistronic or monocistronic
expression [15,16].

Target RNA 

rRNAs and snRNAs are the major targets of snoRNAs. 
In general, the number of modified nucleotides depends 
on the length of the target RNA. For example, human 
28S rRNA and U2 snRNA contain 119 and 13 modification
sites, respectively. However, there are many orphan
snoRNAs whose targets remain to be determined. 

Orthologue 

snOPY also contains information about snoRNA orthologues.
Identifying orthologues with common homology search
techniques such as BLAST is
difficult because the sequence conservation between
snoRNAs from different species is very low (Figure 2).
Although there are some short conserved motifs, BLAST
often fails to identify the correct counterparts. Therefore,
to identify the orthologues, we focused on the sequence
conservation between the target RNAs, such as rRNAs,
rather than on the snoRNA sequences themselves. We
performed sequence alignment of the target RNAs from
different species using ClustalW [17] and then mapped the
modification sites onto that alignment. If the modified
nucleotides were aligned at the same position, we regarded the
snoRNAs that guide these modifications as orthologues.
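
The target-based orthologue assignment described above can be sketched in a few lines of Python. The sketch assumes that the target RNAs have already been aligned (for example with ClustalW) and are supplied as gapped strings of equal length, and that each modification site is given as a 1-based position in the ungapped target RNA together with the name of its guide snoRNA. The function names and data layout are hypothetical and do not reflect how snOPY stores its data. The modification type is also ignored here for brevity.

```python
from collections import defaultdict


def ungapped_to_column(aligned_seq):
    """Map 1-based ungapped positions to 0-based alignment columns."""
    mapping = {}
    pos = 0
    for col, ch in enumerate(aligned_seq):
        if ch != "-":
            pos += 1
            mapping[pos] = col
    return mapping


def group_orthologues(alignment, guides):
    """Group guide snoRNAs whose modification sites share an alignment column.

    alignment: {species: gapped target RNA sequence}
    guides:    {species: {ungapped 1-based site: snoRNA name}}
    Returns {column: [(species, snoRNA name), ...]} for columns that are
    modified in more than one species.
    """
    groups = defaultdict(list)
    for species, sites in guides.items():
        col_of = ungapped_to_column(alignment[species])
        for pos, snorna in sites.items():
            groups[col_of[pos]].append((species, snorna))
    return {col: members for col, members in groups.items() if len(members) > 1}


# Toy alignment and site annotations (synthetic, for illustration only).
alignment = {"sp1": "ACG-TTA", "sp2": "AC-GTTA"}
guides = {"sp1": {5: "snoA"}, "sp2": {5: "snoB", 3: "snoC"}}
orthologue_groups = group_orthologues(alignment, guides)
```

In the toy data, site 5 of both species falls in the same alignment column, so snoA and snoB are grouped as candidate orthologues, whereas snoC has no counterpart.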

Utility and discussion 

snOPY provides several search parameters, including
species, box motif, target RNA, gene organization, curation
status, and keywords. Users can also perform a
BLAST search for the gene sequences, gene loci, and target
RNAs (Figure 3A, 3B). In addition, search results are
visualized using "Locus View", which enables users to
compare the snoRNA locus directly between various
species (Figure 3C).

Figure 1 Secondary structure of snoRNAs and genomic loci. Three types of snoRNA gene loci (top), intermediate transcripts (middle), and
mature box C/D and box H/ACA snoRNAs associated with target RNAs (bottom) are shown. Circles indicate modification sites for methylation (m)
and pseudouridylation (Ψ). snoRNAs, snoRNA gene loci, and target RNAs are shown in red, gray, and blue, respectively.

Each snoRNA entry page provides basic information
about the locus, including the snoRNA gene sequence,
type of box motif, and genomic position (Figure 3D).
Information relating to the gene locus and target RNA is
also provided, and these items are linked to more detailed
descriptions (Figure 3E). Users can retrieve
orthologues and perform multiple sequence alignments
via this page (Figure 3F). The locus entry pages show
schematics of the locus structure and sequence, as well
as other information about the locus (Figure 3G). The
target RNA entry pages show complete RNA sequences
and modification sites (Figure 3H). When available, the
snoRNAs involved in these modifications are also
shown, with links to the individual snoRNA entry page.
Users can access a list of all target RNAs via the "Target
RNA" link at the top of each page (Figure 3A).

Figure 2 A multiple sequence alignment of snoRNAs (SNORD38) from 25 species. Part of the target RNA sequence (H. sapiens) and
modification site are also included. Box motifs and complementary sequences are highlighted in red and blue, respectively. The multiple
alignment was generated by ClustalW [17].

Figure 3 Representative snapshots of snOPY pages. A, search form; B, search results selected with "Homo sapiens"; C, results retrieved from
"Locus View" using "RPL4" as a keyword; D, individual snoRNA entry page for H. sapiens SNORD18A, with box motifs and complementary
sequences highlighted in red and green, respectively; E, orthologues retrieved using "list" in the human SNORD18A page; F, multiple sequence
alignment for SNORD18A; G, snoRNA gene locus of the human RPL4 gene for SNORD18A; H, target RNA and modification sites for human 28S
rRNA; I, an orthologue table for four representative species. With the exception of A and C, only a part of each page is shown in the snapshot.

The orthologues table page shows the orthologous relationships
between snoRNA genes from various species
(Figure 3I). The default setting includes four selected
species, Homo sapiens, Caenorhabditis elegans, Drosophila
melanogaster, and Saccharomyces cerevisiae, which are
well-studied and widely referenced species. Users can
select any species for comparison and readily access the
reference data from the default setting.

At present, there exist several other databases for
snoRNAs, including snoRNA-LBME-db [18], Yeast
snoRNA Database [16], Plant snoRNA Database [19],
and the sno/scaRNAbase [20]. These databases provide
very useful information about the snoRNAs from particular
organisms. However, users are unable to compare
the snoRNAs from various species with these resources.
In contrast, snOPY provides data from a wide variety of species,
which enables users to perform comparative analysis
very efficiently.

Availability and requirements 

snOPY is freely available on the web at
http://snoopy.med.miyazaki-u.ac.jp.